Fuzzy c-means clustering with prior biological knowledge
Abstract
We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which uses biological knowledge and gene expression data simultaneously in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and uses Gene Ontology annotations as prior knowledge to guide the grouping of functionally related genes. Unlike traditional clustering methods, our method can assign a gene to multiple clusters, which represents the behavior of genes more appropriately. We used two datasets of yeast (Saccharomyces cerevisiae) expression profiles to compare our method with other state-of-the-art clustering methods. Our experiments show that our method produces far more biologically meaningful clusters, even when only a small percentage of the Gene Ontology annotations is used. Our experiments further indicate that the prior knowledge used in our method allows it to predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
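As a rough illustration of the soft assignment the abstract refers to, the sketch below implements the standard fuzzy c-means update (membership-weighted centers, then inverse-distance memberships). It is not the GO Fuzzy c-means method: the abstract does not specify how the Gene Ontology prior enters the objective, so that part is left out. The function name `fuzzy_c_means`, its parameters, and the toy data are assumptions made for this example only.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (no prior knowledge). Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(dim)]
            )

        # Memberships are proportional to distance**(-2/(m-1)), normalised per point.
        shift = 0.0
        for i, p in enumerate(points):
            inv = [max(math.dist(p, ctr), 1e-12) ** (-2.0 / (m - 1.0)) for ctr in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Toy "expression profiles": two tight groups plus one profile in between.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [2.5, 2.5]]
centers, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

Each row of `memberships` sums to 1, so a profile lying between two groups receives a substantial membership in both clusters rather than a single hard label. A semi-supervised variant would presumably add a term that pulls annotated genes toward clusters consistent with their annotations, but its exact form is not given here.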
Similar resources
OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM
This paper presents an efficient hybrid method, namely fuzzy particle swarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzy clustering problem, especially for large sizes. When the problem becomes large, the FCM algorithm may result in uneven distribution of data, making it difficult to find an optimal solution in reasonable amount of time. The PSO algorithm does find a go...
Fuzzy C-Means Clustering Algorithm for Site Selection of Groundwater Artificial Recharge Areas (Case Study: Sefied Dasht Plain)
Artificial recharge can be an effective method to raise the groundwater table and to resolve the groundwater crisis in the Sefid Dasht plain. The most important step in successful artificial recharge is locating suitable areas for it. Hence, this research was carried out to determine suitable areas for artificial recharge in the Sefid Dasht plain. Slope, sur...
Information technologies Attribute Weighted Optimization of Fuzzy C-Means Clustering Algorithm
Because the standard fuzzy C-means clustering algorithm performs poorly during clustering, this paper presents an objective function optimization based on attribute weighting. First, a small amount of prior knowledge is used as labeled samples. The information from these calibrated samples is used as the prior knowledge, and th...
Bilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...
Journal title:
- Journal of biomedical informatics
Volume 42, Issue 1
Pages: -
Publication date: 2009